Attention-based Memory Selection Recurrent Network for Language Modeling
Authors
Abstract
Recurrent neural networks (RNNs) have achieved great success in language modeling. However, because an RNN has a fixed-size memory, it cannot store all the information about the words seen earlier in the sentence, so useful long-term information may be ignored when predicting the next word. In this paper, we propose the Attention-based Memory Selection Recurrent Network (AMSRN), in which the model can review the information stored in the memory at each previous time step and select the relevant information to help generate the output. In AMSRN, the attention mechanism finds the time steps whose memory stores relevant information, and memory selection determines which dimensions of the memory are involved in computing the attention weights and from which dimensions the information is extracted. In the experiments, AMSRN outperformed long short-term memory (LSTM) based language models on both English and Chinese corpora. Moreover, we investigate using entropy as a regularizer for the attention weights and visualize how the attention mechanism helps language modeling.
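As a rough illustration of the mechanism described in the abstract, the following minimal NumPy sketch gates the memory dimensions, attends over the states stored at previous time steps, and computes the entropy of the attention weights for use as a regularizer. It is only a sketch under assumed forms: the parameter names (W_sel, W_att), the sigmoid gate, and the bilinear scoring are illustrative choices, not the authors' exact formulation.

```python
# Minimal NumPy sketch of attention with memory selection over past states.
# NOTE: an illustrative assumption of the mechanism, not the paper's exact
# equations; W_sel, W_att and the sigmoid/bilinear forms are made up here.
import numpy as np

def softmax(x):
    x = x - x.max()
    e = np.exp(x)
    return e / e.sum()

def attend_with_selection(h_t, memories, W_sel, W_att):
    """h_t: (d,) current state; memories: (T, d) states from previous steps."""
    # Memory selection: a gate over memory dimensions (assumed sigmoid form).
    gate = 1.0 / (1.0 + np.exp(-(W_sel @ h_t)))          # (d,)
    selected = memories * gate                            # (T, d)

    # Attention over previous time steps, scored against the current state.
    scores = selected @ (W_att @ h_t)                     # (T,)
    alpha = softmax(scores)                               # attention weights

    # Extract the relevant information as a weighted sum of selected memories.
    context = alpha @ selected                            # (d,)

    # Entropy of the attention distribution, usable as a regularization term.
    entropy = -np.sum(alpha * np.log(alpha + 1e-12))
    return context, alpha, entropy

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, T = 8, 5
    ctx, alpha, ent = attend_with_selection(
        rng.normal(size=d), rng.normal(size=(T, d)),
        0.1 * rng.normal(size=(d, d)), 0.1 * rng.normal(size=(d, d)))
    print("weights:", np.round(alpha, 3), "entropy:", round(float(ent), 3))
```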
Similar resources
A Sentence Interaction Network for Modeling Dependence between Sentences
Modeling interactions between two sentences is crucial for a number of natural language processing tasks, including Answer Selection, Dialogue Act Analysis, etc. While deep learning methods such as Recurrent Neural Networks or Convolutional Neural Networks have proved powerful for sentence modeling, prior studies paid less attention to interactions between sentences. In this work, we propo...
Inner Attention based Recurrent Neural Networks for Answer Selection
Attention-based recurrent neural networks have shown advantages in representing natural language sentences (Hermann et al., 2015; Rocktäschel et al., 2015; Tan et al., 2015). Based on recurrent neural networks (RNNs), external attention information was added to the hidden representations to obtain an attentive sentence representation. Despite the improvement over non-attentive models, the attention mech...
Feedforward Sequential Memory Neural Networks without Recurrent Feedback
We introduce a new structure for memory neural networks, called feedforward sequential memory networks (FSMN), which can learn long-term dependency without using recurrent feedback. The proposed FSMN is a standard feedforward neural network equipped with learnable sequential memory blocks in the hidden layers (a rough sketch of this idea appears after this list). In this work, we have applied FSMN to several language modeling (LM) tasks. Experime...
Autoregressive Attention for Parallel Sequence Modeling
We introduce an autoregressive attention mechanism for parallelizable character-level sequence modeling. We use this method to augment a neural model consisting of blocks of causal convolutional layers connected by highway network skip connections. We denote the models without and with the proposed attention mechanism respectively as Highway Causal Convolution (Causal Conv) and Autoregressive-at...
Detecting Hazardous Events from Sequential Data with Multilayer Architectures
Multivariate time series data play an important role in many domains, including real-time monitoring systems. In this paper, we focus on multilayer neural architectures that are capable of learning high level representations from raw data. This includes our previous solution based on Recurrent Neural Networks with Long Short-Term Memory (LSTM) cells. We build upon this work and present improved...
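As referenced in the FSMN entry above, the following is a rough sketch of an FSMN-style memory block under a scalar-coefficient reading of the idea: each hidden activation is augmented with a learned weighted sum of the previous N activations, with no recurrent feedback. The function name, coefficient shape, and nonlinearities are assumptions, not the cited paper's exact design.

```python
# Rough sketch of an FSMN-style memory block (scalar-coefficient reading):
# each hidden activation is augmented with a learned weighted sum of the
# previous N activations; no recurrent feedback is used. Names, shapes and
# nonlinearities here are assumptions, not the cited paper's exact design.
import numpy as np

def fsmn_layer(x, W_in, a, W_out):
    """x: (T, d_in) inputs; a: (N,) learnable look-back coefficients."""
    h = np.tanh(x @ W_in)                       # (T, d_h) feedforward hidden
    h_tilde = h.copy()
    N = a.shape[0]
    for t in range(h.shape[0]):
        for n in range(1, N + 1):               # weighted sum of past steps
            if t - n >= 0:
                h_tilde[t] += a[n - 1] * h[t - n]
    return np.tanh(h_tilde @ W_out)             # pass augmented activations on

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x = rng.normal(size=(6, 4))
    y = fsmn_layer(x, 0.1 * rng.normal(size=(4, 8)),
                   0.1 * rng.normal(size=3), 0.1 * rng.normal(size=(8, 2)))
    print(y.shape)  # (6, 2)
```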
Journal title: CoRR
Volume: abs/1611.08656
Issue: -
Pages: -
Publication date: 2016